3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it remains elusive how to fuse and process cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and realize both the 2D and 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, limiting the network's flexibility. Meanwhile, such a dual-task setting may easily distract the network and lead to over-fitting in the 3D segmentation task. Limited by the network's inflexibility, the fused features can only pass through a decoder network of insufficient depth, which hurts model performance. To alleviate these drawbacks, we argue in this paper that, despite its simplicity, projecting multi-view 2D deep semantic features unidirectionally into the 3D space and aligning them with 3D deep semantic features leads to better feature fusion. On the one hand, the unidirectional projection forces our model to focus more on the core task, i.e., 3D segmentation; on the other hand, relaxing the bidirectional projection to a unidirectional one enables deeper cross-domain semantic alignment and offers the flexibility to fuse better, more complicated features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetV2 benchmark for 3D semantic segmentation.
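A minimal PyTorch sketch of this kind of unidirectional 2D-to-3D lifting, assuming known camera intrinsics and extrinsics; the function name, tensor layout, and view-averaging are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def lift_multiview_features(points, feats_2d, intrinsics, extrinsics):
    """Unidirectionally project multi-view 2D deep features onto 3D points.

    points:     (N, 3) world-space coordinates
    feats_2d:   (V, C, H, W) deep 2D feature maps, one per view
    intrinsics: (V, 3, 3) camera intrinsics
    extrinsics: (V, 4, 4) world-to-camera transforms
    Returns (N, C) per-point 2D features averaged over the views that see them.
    """
    V, C, H, W = feats_2d.shape
    N = points.shape[0]
    homog = torch.cat([points, torch.ones(N, 1)], dim=1)             # (N, 4)
    accum, hits = torch.zeros(N, C), torch.zeros(N, 1)
    for v in range(V):
        cam = (extrinsics[v] @ homog.T).T[:, :3]                     # camera coords
        in_front = cam[:, 2] > 1e-5
        pix = (intrinsics[v] @ cam.T).T
        pix = pix[:, :2] / pix[:, 2:].clamp(min=1e-5)                # pixel coords
        # normalize to [-1, 1] for grid_sample
        grid = torch.stack([pix[:, 0] / (W - 1), pix[:, 1] / (H - 1)], dim=1) * 2 - 1
        visible = in_front & (grid.abs() <= 1).all(dim=1)
        sampled = F.grid_sample(feats_2d[v:v + 1], grid.reshape(1, N, 1, 2),
                                align_corners=True).reshape(C, N).T  # (N, C)
        accum += sampled * visible.float().unsqueeze(1)
        hits += visible.float().unsqueeze(1)
    return accum / hits.clamp(min=1.0)

# Fusion then simply concatenates per-point 3D backbone features with the
# lifted 2D features and feeds the result to a deeper 3D decoder.
```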
Over the years, machine learning models have been successfully employed on neuroimaging data for accurately predicting brain age. Deviations from the healthy brain aging pattern are associated with accelerated brain aging and brain abnormalities. Hence, efficient and accurate techniques are required for eliciting reliable brain age estimates. Several contributions have been reported in the past for this purpose, resorting to different data-driven modeling methods. Recently, deep neural networks (also referred to as deep learning) have become prevalent in manifold neuroimaging studies, including brain age estimation. In this review, we offer a comprehensive analysis of the literature on the adoption of deep learning for brain age estimation with neuroimaging data. We detail and analyze the different deep learning architectures used for this application, dwelling on the research works published to date that quantitatively explore their application. We also examine different brain age estimation frameworks, comparatively exposing their advantages and weaknesses. Finally, the review concludes with an outlook towards future directions that should be followed by prospective studies. The ultimate goal of this paper is to establish a common and informed reference for newcomers and experienced researchers willing to approach brain age estimation with deep learning models.
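For orientation, a minimal sketch of the kind of model this review covers: a small 3D CNN regressing age from a preprocessed MRI volume. The architecture, shapes, and names are illustrative only, not any specific reviewed model:

```python
import torch
import torch.nn as nn

class BrainAgeCNN(nn.Module):
    """Toy 3D CNN brain-age regressor: MRI volume in, predicted age out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.regressor = nn.Linear(32, 1)

    def forward(self, x):                       # x: (B, 1, D, H, W)
        return self.regressor(self.features(x).flatten(1))

model = BrainAgeCNN()
mri = torch.randn(2, 1, 64, 64, 64)             # two dummy volumes
ages = torch.tensor([34.0, 71.0])               # chronological ages
loss = nn.L1Loss()(model(mri).squeeze(1), ages)
# The "brain age gap" studied in this literature is prediction minus
# chronological age; systematic positive gaps suggest accelerated aging.
```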
Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency is still a major concern in existing GAN-based methods. In particular, the most popular metric, $R$-precision, may not accurately reflect text-image consistency, often resulting in very misleading semantics in the generated images. Despite its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a further step and develop a novel CLIP-based metric termed Semantic Similarity Distance ($SSD$), which is both theoretically grounded in a distributional viewpoint and empirically verified on benchmark datasets. Benefiting from the proposed metric, we further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which aim at improving text-image consistency by fusing semantic information at different granularities and capturing accurate semantics. Equipped with two novel plug-and-play components, the Hard-Negative Sentence Constructor and Semantic Projection, the proposed PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments shows that, compared with current state-of-the-art methods, our PDF-GAN achieves significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.
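For illustration, the CLIP-based ingredient can be sketched as a plain cosine distance between image and text embeddings in CLIP's joint space; this is a simplified consistency score under our own assumptions, not the paper's exact $SSD$ formulation:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def semantic_distance(image_path: str, caption: str) -> float:
    """Embed both modalities into CLIP's joint space and score consistency
    by cosine distance (0 = identical semantics, 2 = opposite)."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([caption]).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (1.0 - (img_emb * txt_emb).sum()).item()
```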
While exogenous variables have a major impact on performance improvement in time series analysis, inter-series correlations and time dependencies among them are rarely considered in present continuous-time methods. The dynamical systems underlying multivariate time series can be modeled by complex unknown partial differential equations (PDEs), which play a prominent role in many disciplines of science and engineering. In this paper, we propose a continuous-time model for arbitrary-step prediction that learns the unknown PDE system in multivariate time series, whose governing equations are parameterized by self-attention and gated recurrent neural networks. The proposed model accounts for the exogenous variables and their effects on the target series. Importantly, with specially designed regularization guidance, the model can be reduced to a regularized ordinary differential equation (ODE) problem, which makes the PDE problem tractable to obtain numerical solutions and feasible for predicting multiple future values of the target series at arbitrary steps. Extensive experiments demonstrate that our proposed model achieves competitive accuracy against strong baselines: on average, it outperforms the best baseline for arbitrary-step prediction by reducing RMSE by $9.85\%$ and MAE by $13.98\%$.
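A loose illustration of parameterizing continuous-time dynamics with self-attention and gated recurrence and then integrating them as an ODE; the names, dimensions, and plain Euler solver below are our own assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Toy dynamics dz/dt = f(z), with f built from self-attention across
    the series (inter-series correlation) and a GRU-style gated update."""
    def __init__(self, n_vars, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.gate = nn.GRUCell(dim, dim)

    def forward(self, z):                       # z: (batch, n_vars, dim)
        mixed, _ = self.attn(z, z, z)           # mix exogenous and target series
        b, v, d = z.shape
        gated = self.gate(mixed.reshape(b * v, d), z.reshape(b * v, d))
        return gated.view(b, v, d) - z          # residual as the derivative

def euler_integrate(f, z0, t_grid):
    """Treating the learned system as an ODE makes arbitrary-step prediction
    a matter of integrating further in time (plain Euler steps here)."""
    zs, z = [], z0
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        z = z + (t1 - t0) * f(z)
        zs.append(z)
    return torch.stack(zs)

f = LatentDynamics(n_vars=4, dim=16)
z0 = torch.randn(2, 4, 16)                      # encoded series states
traj = euler_integrate(f, z0, torch.linspace(0.0, 1.0, steps=11))
print(traj.shape)                               # (10, 2, 4, 16)
```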
Instance segmentation on point clouds is crucial for 3D scene understanding. Distance clustering is commonly used in state-of-the-art (SOTA) methods and is generally effective, but it performs poorly on adjacent objects with the same semantic label, especially those sharing neighboring points. Owing to the uneven distribution of the offset points, these existing methods can hardly cluster all the instance points. To this end, we design a novel divide-and-conquer strategy and propose an end-to-end network named PBNet that binarizes each point and clusters them separately to segment instances. PBNet divides the offset instance points into two categories, high-density points and low-density points (HPs vs. LPs), and then conquers them separately. Adjacent objects can be clearly separated by removing the LPs, and the instances are then completed and refined by assigning the LPs via a neighbor-voting method. To further reduce clustering errors, we develop an iterative merging algorithm based on mean instance size to aggregate fragmented instances. Experiments on the ScanNetV2 and S3DIS datasets demonstrate the superiority of our model. In particular, PBNet achieves the best AP50 and AP25 to date on the ScanNetV2 official benchmark challenge (validation set), while also demonstrating high efficiency.
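A toy NumPy/scikit-learn sketch of this binarize-then-conquer idea; the density estimate, thresholds, and DBSCAN as the HP clusterer are our assumptions, and the paper's iterative mean-size merging step is omitted:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def binarize_and_conquer(points, k=8, eps=0.05, density_q=0.5):
    """Split points into high/low-density groups (HPs vs. LPs), cluster only
    the HPs so adjacent instances stay separated, then complete each instance
    by assigning every LP through neighbor voting."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nbrs.kneighbors(points)
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-8)   # inverse mean kNN distance
    is_hp = density >= np.quantile(density, density_q)

    labels = np.full(len(points), -1, dtype=int)
    labels[is_hp] = DBSCAN(eps=eps, min_samples=4).fit_predict(points[is_hp])

    hp_idx = np.where(labels >= 0)[0]                    # clustered HPs
    voter = NearestNeighbors(n_neighbors=min(k, len(hp_idx))).fit(points[hp_idx])
    for i in np.where(labels < 0)[0]:                    # neighbor voting for LPs
        _, nn = voter.kneighbors(points[i:i + 1])
        labels[i] = np.bincount(labels[hp_idx[nn[0]]]).argmax()
    return labels

rng = np.random.default_rng(0)
blob = lambda c: c + 0.03 * rng.standard_normal((100, 3))
points = np.vstack([blob(np.zeros(3)), blob(np.ones(3))])  # two instances
print(np.unique(binarize_and_conquer(points)))
```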
Image outpainting, which has been well studied under convolutional neural network (CNN) frameworks, has recently attracted more attention in computer vision. However, CNNs rely on inherent inductive biases to achieve sample-efficient learning, which may lower the performance ceiling. In this paper, motivated by the flexible self-attention mechanism in minimalist transformer architectures, we reframe the generalized image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. Specifically, we propose a novel hybrid vision-transformer-based encoder-decoder framework, named \textbf{Query} \textbf{O}utpainting \textbf{TR}ansformer (\textbf{QueryOTR}), to extrapolate visual context all around a given image. The global modeling capacity of the patch-wise mode allows us to extrapolate images from the query standpoint of the attention mechanism. A novel Query Expansion Module (QEM) is designed to integrate information from the predicted queries based on the encoder's output, hence accelerating the convergence of the pure transformer even on a relatively small dataset. To further enhance the connectivity between patches, the proposed Patch Smoothing Module (PSM) re-allocates and averages the overlapped regions, thus providing seamless predicted images. We experimentally show that QueryOTR generates visually appealing results smoothly and realistically against state-of-the-art image outpainting methods.
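The patch-smoothing idea can be illustrated with a short PyTorch sketch that averages overlapped regions when stitching predicted patches back into an image; the shapes and names are illustrative, not QueryOTR's actual module:

```python
import torch
import torch.nn.functional as F

def smooth_patches(patches, img_size, patch, stride):
    """Stitch overlapping predicted patches into one image, re-allocating and
    averaging the overlapped regions so no seams remain.
    patches: (B, num_patches, C * patch * patch) decoder outputs."""
    cols = patches.transpose(1, 2)                       # (B, C*patch*patch, L)
    summed = F.fold(cols, img_size, kernel_size=patch, stride=stride)
    counts = F.fold(torch.ones_like(cols), img_size,
                    kernel_size=patch, stride=stride)    # overlap multiplicity
    return summed / counts                               # average the overlaps

# e.g. a 3-channel 64x64 output assembled from 8x8 patches predicted at stride 4
B, C, patch, stride, size = 2, 3, 8, 4, 64
n = ((size - patch) // stride + 1) ** 2                  # patches per image
pred = torch.randn(B, n, C * patch * patch)              # stand-in decoder output
image = smooth_patches(pred, (size, size), patch, stride)
print(image.shape)                                       # (2, 3, 64, 64)
```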
Unsupervised cross-modality medical image adaptation aims to alleviate the severe domain gap between different imaging modalities without using target-domain labels. The key to this endeavor lies in aligning the distributions of the source and target domains. One common attempt is to enforce global alignment between the two domains; however, this neglects the fatal problem of locally imbalanced domain gaps, i.e., some local features with large domain gaps are hard to transfer. Recently, some methods conduct alignment focused on local regions to improve the efficiency of model learning, yet this operation may cause a deficit of critical contextual information. To tackle this limitation, we propose a novel strategy to alleviate the imbalanced domain gap for medical images, namely global-local union alignment. Specifically, a feature-triggered style-transfer module first synthesizes target-like source images to reduce the global domain gap. A local feature mask is then integrated to reduce the "gap" in local features by prioritizing discriminative features with larger domain gaps. This combination of global and local alignment can precisely localize crucial regions in segmentation targets while preserving overall semantic consistency. We conduct a series of experiments with two cross-modality adaptation tasks, i.e., cardiac substructure and abdominal multi-organ segmentation. Experimental results indicate that our method achieves state-of-the-art performance on both tasks.
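As a rough sketch of prioritizing local features with larger domain gaps, one can weight a per-location alignment loss by the local discrepancy; this toy PyTorch function assumes spatially comparable feature maps and is not the paper's actual module:

```python
import torch
import torch.nn.functional as F

def gap_weighted_alignment(f_src, f_tgt, temperature=1.0):
    """Illustrative local alignment loss: positions whose source/target
    feature statistics differ most (largest local domain gap) receive the
    largest weights. f_src, f_tgt: (B, C, H, W) feature maps."""
    gap = (f_src.mean(dim=1) - f_tgt.mean(dim=1)).abs()      # (B, H, W) local gap
    mask = F.softmax(gap.flatten(1) / temperature, dim=1)    # prioritize large gaps
    per_pos = (f_src - f_tgt).pow(2).mean(dim=1).flatten(1)  # (B, H*W) discrepancy
    return (mask * per_pos).sum(dim=1).mean()

loss = gap_weighted_alignment(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
```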
We consider the problem of multi-view 3D face reconstruction (MVR) with weakly supervised learning, which leverages a limited number of 2D face images (e.g., 3) to generate a high-quality 3D face model with very light annotation. Despite their encouraging performance, present MVR methods simply concatenate multi-view image features and pay less attention to critical areas (e.g., the eyes, brows, nose, and mouth). To this end, we propose a novel model named Deep Fusion MVR (DF-MVR) and design a multi-view-encoding-to-single-decoding framework with skip connections, able to extract, integrate, and compensate deep features with attention from multiple views. In addition, we develop a multi-view face parsing network to learn, identify, and emphasize the critical common face areas. Finally, although our model is trained with a few 2D images, it can reconstruct an accurate 3D model even when a single 2D image is input. We conduct extensive experiments to evaluate various multi-view 3D face reconstruction methods. Experiments on the Pixel-Face and Bosphorus datasets demonstrate the superiority of our model. Without 3D landmark annotations, DF-MVR achieves 5.2% and 3.0% RMSE improvements over the best existing weakly supervised MVR on the Pixel-Face and Bosphorus datasets, respectively; with 3D landmark annotations, DF-MVR performs even better on the Pixel-Face dataset, achieving a 13.4% RMSE improvement over the best weakly supervised MVR model.
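A toy PyTorch block illustrating attention-based multi-view feature fusion in this spirit, in contrast to plain concatenation; the per-view softmax weighting, names, and shapes are our assumptions, not DF-MVR's architecture:

```python
import torch
import torch.nn as nn

class AttentiveViewFusion(nn.Module):
    """Learn per-view attention weights so complementary views can compensate
    each other before a single shared decoder, instead of simply
    concatenating multi-view features."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                   # feats: (B, V, C, H, W)
        B, V, C, H, W = feats.shape
        logits = self.score(feats.view(B * V, C, H, W)).view(B, V, 1, H, W)
        weights = torch.softmax(logits, dim=1)  # attention across the V views
        return (weights * feats).sum(dim=1)     # (B, C, H, W) fused map

fusion = AttentiveViewFusion(channels=32)
views = torch.randn(4, 3, 32, 56, 56)           # 3 face views per sample
fused = fusion(views)                           # feeds the single shared decoder
```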
While most present image outpainting methods conduct horizontal extrapolation, we study the generalized image outpainting problem, which extrapolates visual context all around a given image. To this end, we develop a novel transformer-based generative adversarial network called U-Transformer, able to extend image borders with plausible structure and details even for complicated scenery images. Specifically, we design the generator as an encoder-to-decoder structure embedded with the popular Swin Transformer blocks. As such, our novel framework can better cope with long-range dependencies in images, which is crucial for generalized image outpainting. We additionally propose a U-shaped structure and a multi-view temporal-spatial predictor network to reinforce image self-reconstruction as well as unknown-part prediction. We experimentally demonstrate that our proposed method produces visually appealing results for generalized image outpainting against state-of-the-art image outpainting methods.
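A bare-bones PyTorch skeleton of an encoder-to-decoder outpainting generator; plain transformer encoder layers stand in for the paper's Swin Transformer blocks, and the adversarial training, U-shaped structure, and temporal-spatial predictor are all omitted:

```python
import torch
import torch.nn as nn

class TinyOutpaintGenerator(nn.Module):
    """Minimal sketch: tokenize the input image, apply transformer blocks for
    long-range mixing, then decode to a larger canvas (all-side extension).
    Assumes a fixed 32x32 input with patch size 8 (a 4x4 token grid)."""
    def __init__(self, dim=64, patch=8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.expand = nn.ConvTranspose2d(dim, 3, kernel_size=2 * patch,
                                         stride=2 * patch)

    def forward(self, x):                                  # x: (B, 3, 32, 32)
        B = x.shape[0]
        t = self.embed(x).flatten(2).transpose(1, 2)       # (B, 16, dim) tokens
        t = self.blocks(t)                                 # long-range mixing
        f = t.transpose(1, 2).reshape(B, -1, 4, 4)
        return self.expand(f)                              # (B, 3, 64, 64)

gen = TinyOutpaintGenerator()
extended = gen(torch.randn(2, 3, 32, 32))                  # borders extrapolated
```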
Automatic nuclei segmentation and classification plays a vital role in digital pathology. However, previous works were mostly built on data with limited diversity and small sizes, making the results questionable or misleading for practical downstream tasks. In this paper, we aim to build a reliable and robust method capable of dealing with data from "the clinical wild". Specifically, we study and design a new method to simultaneously detect, segment, and classify nuclei from hematoxylin-and-eosin (H&E) stained histopathology data, and evaluate our approach on the largest recent dataset: PanNuke. We address the detection and classification of each nucleus as a novel semantic keypoint estimation problem to determine the center point of each nucleus. Next, the corresponding class-agnostic masks for the nucleus center points are obtained using dynamic instance segmentation. By decoupling the two simultaneously challenging tasks, our method can benefit from class-aware detection and class-agnostic segmentation, leading to a significant performance boost. We demonstrate the superior performance of our proposed method for nuclei segmentation and classification across 19 different tissue types, delivering new benchmark results.
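The keypoint side of this decoupling can be sketched with standard heatmap peak extraction (CenterNet-style max-pooling non-maximum suppression); the dynamic instance-segmentation head is not shown, and all names and thresholds are illustrative:

```python
import torch
import torch.nn.functional as F

def extract_centers(heatmaps, score_thresh=0.5, kernel=3):
    """Decode nucleus centers from class-wise keypoint heatmaps: a point is a
    detection if it is a local maximum of its class map and exceeds a score
    threshold. heatmaps: (B, K, H, W) with K nucleus classes. The
    class-agnostic mask for each detected center would then come from a
    separate dynamic instance-segmentation head (not shown)."""
    pooled = F.max_pool2d(heatmaps, kernel, stride=1, padding=kernel // 2)
    peaks = (heatmaps == pooled) & (heatmaps > score_thresh)
    batch, cls, ys, xs = peaks.nonzero(as_tuple=True)
    return [(b.item(), k.item(), y.item(), x.item())
            for b, k, y, x in zip(batch, cls, ys, xs)]

maps = torch.rand(1, 5, 128, 128).pow(8)     # dummy sparse heatmaps, 5 classes
centers = extract_centers(maps)              # [(batch, class, row, col), ...]
```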